The purpose of this script is (a) to find out which parameters produce the best-fit LISFLOOD model of the 100-year floodplain in Sonoma County, and (b) to determine which parameters the results are sensitive to.

## setup information
source('_data/setup.R')

num_cores <- 5

load('_data/lisflood/dem.Rdata')

require(ggpubr)
require(grid)
require(gridExtra)
require(gt)

1. Generate LISFLOOD simulations

The first goal is to define the parameters of interest and calculate inundation maps for those sets of parameters. We generate 1,000 Latin hypercube samples of 20 uniform variables and transform them to match the distributions of the parameters under consideration. We run the script \(generate_files.sh\) to create the .bci, .bdy, .n.asc, and .par files based on these samples. The samples and associated files are then run through LISFLOOD to produce 1,000 inundation maps using the script \(run_lisflood.sh\). All of these files can be found in the rp100/sherlock folder. An explanation of the various files and of the simulation process can be found in the LISFLOOD documentation.
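The sampling step can be sketched in base R. This is a minimal illustration only, not the actual \(generate_files.sh\) workflow; the `latin_hypercube` helper is hypothetical.

```r
## minimal sketch (hypothetical helper, not the actual generation script):
## one stratified uniform draw per row gives a Latin hypercube sample
latin_hypercube <- function(n, k) {
  sapply(1:k, function(j) (sample(n) - runif(n)) / n)
}
set.seed(1)
lhs.raw <- latin_hypercube(1000, 20)   # 1,000 samples of 20 U(0,1) variables

## transform a column to a parameter range from the table below,
## e.g. channel roughness SGCn ~ U(0.015, 0.075)
SGCn <- qunif(lhs.raw[, 1], min = 0.015, max = 0.075)
```

Each column is stratified: exactly one sample falls in each of the 1,000 equal-width bins of the unit interval, which gives better coverage of the parameter space than independent uniform draws.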

2. Determine accuracy metrics

The next goal is to determine some measure of accuracy, i.e. how well the LISFLOOD simulations are able to recreate the “true” case. Here the source of truth is the Federal Emergency Management Agency (FEMA) 100-year floodplain, as downloaded from the National Flood Hazard Layer (NFHL). We estimated the peak inflow at USGS gage 11463500 to be \(Q_p = 112,000 \; cfs = 3,171 \; m^3/s\) using the USGS StreamStats tool. This value is fixed, and twenty other parameters that can be changed in the LISFLOOD model are varied to find the best-fit inundation map. A table of these parameters and the range of values considered for each is included below.

We then compare these 1,000 simulated inundation maps to the FEMA NFHL using two different accuracy metrics: the critical success ratio (fstat) and the Hausdorff distance (hausdorff). The critical success ratio is the ratio of correctly predicted flood cells (i.e. true positives) to all flood cells (i.e. true positives + false negatives + false positives). This produces a balanced measure of over- vs. under-prediction. The Hausdorff distance is a spatial measure of how well the shape of the simulated floodplain matches the shape of the FEMA NFHL. It is important to note that both of these accuracy metrics measure the goodness-of-fit of binary (i.e. wet/dry) outcomes. While LISFLOOD outputs a flood depth at every location, the NFHL data does not provide this information, so all validation was performed only on the quality of the fit between the “true” and simulated 100-year flood extents.
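As a concrete check of the first metric, here is a toy critical success ratio calculation with assumed cell counts (for illustration only, not taken from the actual simulations):

```r
## toy critical success ratio on assumed confusion-table counts:
## hits = wet in both, misses = wet observed only, false alarms = wet simulated only
hits <- 40; misses <- 10; false.alarms <- 15
csi <- hits / (hits + misses + false.alarms)
round(csi, 3)   # 0.615
```

Correct dry cells (true negatives) are deliberately excluded from the denominator, since an overwhelming majority of dry cells would otherwise inflate the score.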

#### compute accuracy statistics for each inundation map ####

## load NFHL "observed" data
load('_data/NFHL/NFHL.Rdata')

## load simulated data
sim.list <- list.files('_sensitivity/rp100/sherlock/results') %>% 
  gsub('rpflow', '', .) %>% gsub('.max', '', .) %>% toNumber
  
## write a function to generate the reduced confusion matrix
confusion <- function(obs, sim) {
  ## 0 = hit, -1 = miss, 1 = false alarm, NA = correct negative (dry in both)
  ifelse(obs > 0 & sim > 0, 0, 
         ifelse(obs > 0 & sim <= 0, -1, 
                ifelse(obs <= 0 & sim > 0, 1, NA)))
}

## write a function to collapse simulation into wet/dry binary
binary <- function(x) {
  ## 1 = wet, 0 = dry (NA cells treated as dry)
  mat <- as.matrix(x)
  ifelse(is.na(mat), 0, ifelse(mat > 0, 1, 0))
}

# ## compute accuracy metrics
# start <- Sys.time()
# # pb <- txtProgressBar(min = 0, max = length(sim.list), style = 3)
# cl <- parallel::makeCluster(num_cores)
# registerDoSNOW(cl)
# accuracy <-
#   foreach(i = sim.list, .inorder = FALSE,
#     .combine = 'rbind', .export = c('confusion', 'binary'),
#     # .options.snow = list(progress = function(n) setTxtProgressBar(pb, n)),
#     .packages = c('raster', 'dplyr', 'pracma')) %dopar% {
#       sim <- paste0('_sensitivity/rp100/sherlock/results/rpflow', i, '.max') %>%
#         raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
# 
#       ## calculate confusion matrix statistics
#       tb <- overlay(obs, sim, fun = confusion)[] %>% table
# 
#       ## calculate Hausdorff distance
#       hd <- hausdorff_dist(binary(sim), as.matrix(obs))
# 
#       ## save metrics as dataframe
#       c(id = i,
#         hitrate = unname(tb['0'] / (tb['-1'] + tb['0'])),
#         falsalarm = unname(tb['1'] / (tb['0'] + tb['1'])),
#         fstat = unname(tb['0'] / sum(tb)),
#         bias = unname(tb['1'] / tb['-1']),
#         hausdorff = hd)
#     }
# stopCluster(cl)
# Sys.time() - start
# 
# ## save checkpoint
# save(accuracy, file = '_sensitivity/rp100/checkpoints/accuracy.Rdata')
load('_sensitivity/rp100/checkpoints/accuracy.Rdata')

## join to samples dataframe
samples.rp100 <- read.table('_sensitivity/rp100/sherlock/samples_rp100.txt', header = TRUE) 
vars <- names(samples.rp100)
samples.rp100 <- samples.rp100 %>%
  mutate(id = 1:nrow(.)) %>% 
  left_join(data.frame(accuracy), by = 'id')
**LISFLOOD Sensitivity Testing Parameters**

| Name | Description | Min¹ | Max¹ |
|------|-------------|------|------|
| *Floodplain Roughness Parameters* | | | |
| 95 | Emergent Herbaceous Wetlands | 0.035 | 0.12 |
| 90 | Woody Wetlands | 0.045 | 0.15 |
| 82 | Cultivated Crops | 0.020 | 0.10 |
| 81 | Pasture/Hay | 0.025 | 0.090 |
| 71 | Grassland/Herbaceous | 0.025 | 0.070 |
| 52 | Shrub/Scrub | 0.050 | 0.16 |
| 43 | Mixed Forest | 0.080 | 0.20 |
| 42 | Evergreen Forest | 0.080 | 0.18 |
| 41 | Deciduous Forest | 0.10 | 0.20 |
| 31 | Barren Land | 0.023 | 0.10 |
| 24 | Developed - High Intensity | 0.12 | 0.20 |
| 23 | Developed - Medium Intensity | 0.060 | 0.16 |
| 22 | Developed - Low Intensity | 0.060 | 0.12 |
| 21 | Developed - Open Space | 0.030 | 0.090 |
| 11 | Open Water | 0.020 | 0.050 |
| *Channel Parameters* | | | |
| SGCn | Channel Roughness Parameter | 0.015 | 0.075 |
| SGCr | Channel Shape Parameter | 0.010 | 0.15 |
| edge | Inlet Channel Width (m) | 10 | 25 |
| *Hydrograph Parameters* | | | |
| tp | Time to Peak Flow (hrs) | 0 | 200 |
| m | Hydrograph Shape Parameter | 0 | 10 |

¹ All parameters are assumed to be uniformly distributed.

3. Attempt model falsification

The third goal is to attempt model falsification. This process helps us understand whether the parameter ranges we have chosen are reasonable, and whether the LISFLOOD model with those ranges is even capable of reproducing the FEMA NFHL under best-case circumstances. We do this with multi-dimensional scaling (MDS). We create a matrix of the “distance” (inverse of accuracy) between every pair of simulations, then reduce the dimensionality of that matrix so that it can be displayed on a 2D plot. If the point representing the FEMA NFHL map falls outside of the point cloud representing the suite of LISFLOOD simulations, then the model with the chosen parameter ranges will never be able to perfectly reproduce the “true” case, and the model is falsified. If the NFHL point falls within the simulation point cloud, the model is deemed acceptable.
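The MDS step itself can be illustrated with base R's `cmdscale` (the same function used below) on a small synthetic distance matrix; the toy points here stand in for the real simulation distances:

```r
## toy MDS: embed a small symmetric distance matrix in 2D and compute
## the share of variance explained by each dimension
set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)   # 10 synthetic points in 2D
D <- as.matrix(dist(pts))            # pairwise Euclidean distances
mds <- cmdscale(D, eig = TRUE, k = 2)
var.explained <- abs(mds$eig) / sum(abs(mds$eig))
```

Because the toy distances are exactly Euclidean in 2D, the first two dimensions recover essentially all of the variance; the real distance matrices below are not Euclidean, so more dimensions are needed.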

As mentioned in the previous section, we have 1,000 LISFLOOD simulations. There is substantial similarity between the inundation maps for these simulations, and creating a \(1000 \times 1000\) distance matrix is computationally expensive. Therefore we first use k-means clustering to identify 100 representative simulations and perform the MDS falsification process with that reduced set.
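The reduction idea can be sketched with synthetic accuracy metrics in place of the real ones (illustration only; column names mirror the metrics computed above):

```r
## toy sketch of the reduction step: k-means on (synthetic) accuracy
## metrics, then keep the best-scoring simulation in each cluster
set.seed(1)
metrics <- data.frame(fstat = runif(1000), hausdorff = runif(1000))
km <- kmeans(scale(metrics), centers = 100, iter.max = 1000, nstart = 10)

## representative = simulation with the highest fstat within each cluster
rep.id <- sapply(sort(unique(km$cluster)), function(cl)
  which(km$cluster == cl)[which.max(metrics$fstat[km$cluster == cl])])
```

This cuts the pairwise-distance work from \(1000^2\) to roughly \(100^2\) comparisons while keeping a spread of behaviors across the metric space.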

#### cluster points to reduce computational demand ####

## run k-means clustering algorithm
n <- 100
# samples.k <- samples.rp100 %>% 
#   filter(complete.cases(.)) %>% 
#   mutate(bias.odds = bias / (1 + bias),
#          hausdorff = hausdorff/max(hausdorff)) %>% 
#   dplyr::select(hitrate, falsalarm, fstat, bias.odds, hausdorff) %>% 
#   kmeans(centers = n, iter.max = 1e3, nstart = 100, trace = FALSE)
# samples.rp100$mds.cluster[complete.cases(samples.rp100)] <- samples.k$cluster
# 
# ## find the best-fitting simulation within each cluster
# cluster.id <- samples.rp100 %>% 
#   mutate(test = hausdorff/Max(hausdorff) + (1-fstat)) %>% 
#   group_by(mds.cluster) %>% 
#   summarize(id = id[which.min(test)], .groups = 'drop') %>% pull(id)
# 
# ## save checkpoint
# save(samples.rp100, cluster.id, file = '_sensitivity/rp100/checkpoints/samples_cluster.Rdata')
load('_sensitivity/rp100/checkpoints/samples_cluster.Rdata')
#### create distance matrix based on fstat/critical success ratio ####

# ## generate matrix
# start <- Sys.time()
# # pb <- txtProgressBar(min = 0, max = n, style = 3)
# cl <- parallel::makeCluster(num_cores)
# registerDoSNOW(cl)
# fstat <-
#   foreach(i = 1:n,
#     .combine = 'rbind', .export = 'confusion',
#     # .options.snow = list(progress = function(n) setTxtProgressBar(pb, n)),
#     .packages = c('raster', 'dplyr', 'pracma', 'foreach')) %dopar% {
#       sim.i <- paste0('_sensitivity/rp100/sherlock/results/rpflow', cluster.id[i], '.max') %>%
#         raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#       foreach(j = 1:n, .combine = 'c') %do% {
#         if (i >= j) 0 else {
#           sim.j <- paste0('_sensitivity/rp100/sherlock/results/rpflow', cluster.id[j], '.max') %>%
#             raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#           tb <- overlay(sim.i, sim.j, fun = confusion)[] %>% table
#           1 - tb['0']/sum(tb)
#         }
#       }
#     }
# stopCluster(cl)
# Sys.time() - start
# 
# ## add "observed" NFHL point
# fstat <- fstat %>%
#   cbind(1-samples.rp100$fstat[cluster.id]) %>%
#   rbind(rep(0, n+1))
# 
# ## format matrix for MDS
# fstat <- fstat + t(fstat)
# 
# ## save checkpoint
# save(fstat, file = '_sensitivity/rp100/checkpoints/fstat.Rdata')
load('_sensitivity/rp100/checkpoints/fstat.Rdata')
#### create distance matrix based on hausdorff distance ####

# ## generate matrix
# start <- Sys.time()
# # pb <- txtProgressBar(min = 0, max = n, style = 3)
# cl <- parallel::makeCluster(num_cores)
# registerDoSNOW(cl)
# hausdorff <-
#   foreach(i = 1:n,
#     .combine = 'rbind', .export = 'binary',
#     # .options.snow = list(progress = function(n) setTxtProgressBar(pb, n)),
#     .packages = c('raster', 'dplyr', 'pracma', 'foreach')) %dopar% {
#       sim.i <- paste0('_sensitivity/rp100/sherlock/results/rpflow', cluster.id[i], '.max') %>%
#         raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#       foreach(j = 1:n, .combine = 'c') %do% {
#         if (i >= j) 0 else {
#           sim.j <- paste0('_sensitivity/rp100/sherlock/results/rpflow', cluster.id[j], '.max') %>%
#             raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#           hausdorff_dist(binary(sim.i), binary(sim.j))
#         }
#       }
#     }
# stopCluster(cl)
# Sys.time() - start
# 
# ## add "observed" NFHL point
# hausdorff <- hausdorff %>%
#   cbind(samples.rp100$hausdorff[cluster.id]) %>%
#   rbind(rep(0, n+1))
# 
# ## format matrix for MDS
# hausdorff <- hausdorff + t(hausdorff)
# 
# ## save checkpoint
# save(hausdorff, file = '_sensitivity/rp100/checkpoints/hausdorff.Rdata')
load('_sensitivity/rp100/checkpoints/hausdorff.Rdata')
#### report results of multi-dimensional scaling (MDS) ####

## perform MDS computation
mds.fstat <- cmdscale(fstat, eig = TRUE, k = n/2)
mds.hd <- cmdscale(hausdorff, eig = TRUE, k = n/2)

## scale eigenvalues to represent % variance explained
mds.fstat$var <- abs(mds.fstat$eig)/sum(abs(mds.fstat$eig))
mds.hd$var <- abs(mds.hd$eig)/sum(abs(mds.hd$eig))

## determine how many dimensions it requires to get to 90% of variance explained
mds.fstat$dim90 <- min(which(cumsum(mds.fstat$var) > 0.9))
mds.hd$dim90 <- min(which(cumsum(mds.hd$var) > 0.9))

In the plots above, we see that the FEMA NFHL point falls within the point cloud of LISFLOOD simulations using both the critical success ratio and the Hausdorff distance as metrics. Therefore we are unable to falsify the model, and we assume that the parameter ranges we have chosen are reasonable.

4. Examine parameter sensitivity

The next task is to determine which of the twenty parameters of interest actually have a significant effect on the resulting inundation maps using regional sensitivity analysis (RSA). The RSA process is as follows:

1. Cluster the simulations into \(k\) groups based on their pairwise distances in MDS space.
2. For each cluster and each parameter, compute the L1 norm between the parameter's empirical CDF within the cluster and its empirical CDF across all simulations.
3. Bootstrap subsamples of the same size from the full set of simulations to build a null distribution of L1 norms, and take its 95th percentile as a significance threshold.
4. Flag a parameter as sensitive if its within-cluster L1 norm exceeds the bootstrap threshold in any cluster.

We use the Hausdorff MDS distance matrix for this analysis, and we repeat the RSA process for a variety of \(k\) values. The results are shown in the plots below.

#### write L1-norm bootstrap function ####

L1_boot <- function(data, col, cl, var, boot = 1e3) {
  # data: dataframe with parameter values & clustering information
  # col: name of clustering column
  # cl: cluster number to consider
  # var: parameter to consider
  # boot: number of bootstrapping samples to draw
  
  x <- sort(data[data[,col] == cl, var])
  x.all <- sort(data[,var])
  p <- (1:length(x))/(length(x)+1)
  p.all <- (1:length(x.all))/(length(x.all)+1)
  x.interp <- interp1(x = p.all, y = x.all, xi = p)
  
  if (is.na(boot)) {
    return(cbind(x, x.interp) %>% apply(1, diff) %>% abs %>% sum)
  } else {
    L1 <- 
      foreach (b = 1:boot, .combine = 'c') %do% {
        sample(x.all, size = length(x), replace = TRUE) %>% sort %>% 
          cbind(x.interp) %>% apply(1, diff) %>% abs %>% sum
      }
    return(L1)
  }
}
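The L1 CDF distance at the heart of L1_boot can be demonstrated standalone in base R; here `approx` stands in for pracma's `interp1`, and the biased "cluster" is synthetic:

```r
## standalone illustration of the L1 CDF distance: a biased subsample
## produces a large L1 norm against the full-sample CDF, signaling that
## the clustering depends on this parameter
set.seed(1)
x.all <- sort(runif(500))                 # full parameter sample
x.sub <- sort(x.all[x.all > 0.5])         # biased "cluster" subsample
p     <- seq_along(x.sub) / (length(x.sub) + 1)
p.all <- seq_along(x.all) / (length(x.all) + 1)
x.interp <- approx(x = p.all, y = x.all, xout = p)$y
L1 <- sum(abs(x.sub - x.interp))          # large L1 => sensitive parameter
```

An unbiased subsample of the same size would track the full-sample CDF closely and give an L1 norm near the bootstrap noise floor, which is exactly what the 95th-percentile threshold screens out.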
#### write RSA function ####

RSA <- function(k, plot = FALSE, data = FALSE) {
  # k: number of clusters to evaluate
  # plot: determines which diagnostic plots to output
        # (can be TRUE, FALSE, or any combination of c('k', 'var', 'rsa'))
  # data: determines whether to output the RSA hypothesis testing values
  
  ## perform k-means clustering on MDS distances
  kclust <- 
    mds.hd$points[-(n+1), 1:mds.hd$dim90] %>% 
    kmeans(centers = k, iter.max = 1000, nstart = 10)
  cluster.colors <- scico(palette = 'bamako', n = k+2)[2:(k+1)]
  
  ## attach MDS distances to parameter information
  df.cluster <- mds.hd$points[-(n+1), 1:2] %>% 
    as.data.frame %>% 
    setNames(c('x','y')) %>%
    mutate(id = cluster.id) %>% 
    left_join(samples.rp100, by = 'id') %>% 
    mutate(rsa.cluster = factor(kclust$cluster))
  
  ## generate 95% confidence intervals for every parameter & cluster
  cl <- parallel::makeCluster(num_cores)
  registerDoSNOW(cl)
  d <- 
    foreach(cl = 1:k, .combine = 'rbind', 
      .export = 'L1_boot', .packages = c('dplyr', 'pracma', 'foreach'), .inorder = FALSE) %:%
    foreach(var = vars, .combine = 'rbind', .inorder = FALSE) %dopar% {
      data.frame(
        cl = cl, var = var, 
        d = L1_boot(df.cluster, 'rsa.cluster', cl, var, NA),
        d95 = L1_boot(df.cluster, 'rsa.cluster', cl, var) %>% quantile(0.95) %>% unname)
    }
  stopCluster(cl)
  
  ## standardize L1-norm coefficients 
  d <- d %>% mutate(d.norm = d/d95, pass = d.norm>1)
  
  ## return results
  plot.RSA(plot, k, df.cluster, d, cluster.colors)
  if (data) df.cluster <<- df.cluster
  return(d)
}
#### perform regional sensitivity analysis (RSA) ####

## run RSA for different values of k
d2 <- RSA(2, plot = 'rsa', data = TRUE)

d3 <- RSA(3, plot = 'rsa', data = TRUE)

d5 <- RSA(5, plot = 'rsa', data = TRUE)

#### perform RSA sensitivity/stability testing ####

# ## run RSA for k=3 several times
# rsa.k3 <- map(.x = 1:10, .f = ~RSA(3, data = TRUE) %>%
#                 filter(pass) %>% pull(var) %>% unique)
# 
# ## run RSA for k=2:10
# rsa.ktest <- map(.x = 2:10, .f = ~RSA(.x, data = TRUE) %>%
#                    filter(pass) %>% pull(var) %>% unique)
# 
# ## save checkpoint 
# save(rsa.k3, rsa.ktest, file = '_sensitivity/rp100/checkpoints/rsa.Rdata')
# load('_sensitivity/rp100/checkpoints/rsa.Rdata')

important <- c('edge', 'SGCr', 'tp', 'X11', 'X22', 'X23', 'X71')
#### create DGSA interactions matrix ####

# ## create 3 MDS clusters
# k <- 3
# df <- RSA(k, data = TRUE)
# 
# ## find parameter interactions within those clusters
# start <- Sys.time()
# # pb <- txtProgressBar(min = 0, max = length(vars)^2, style = 3)
# cl <- parallel::makeCluster(num_cores)
# registerDoSNOW(cl)
# interactions <- 
#   foreach (i = 1:length(vars), .combine = 'rbind', 
#     # .options.snow = list(progress = function(n) setTxtProgressBar(pb, n)),
#     .export = 'L1_boot', .packages = c('pracma', 'tidyverse', 'foreach')) %:%
#   foreach (j = 1:length(vars), .combine = 'c') %dopar% {
#     if (i == j) {
#       ## fill in the main diagonal
#       df %>% filter(var == vars[i]) %>% pull(d.norm) %>% max
# 
#     } else {
#       ## fill in the off-diagonals
#       foreach (cl = 1:k, .combine = 'max') %do% {
#         df.subset <- df.cluster %>% filter(rsa.cluster == cl)
#         df.subset$dgsa.cluster <- 
#           kmeans(df.subset %>% pull(vars[j]), 
#                  centers = 2, iter.max = 1000, nstart = 100)$cluster
#         d <- L1_boot(df.subset, 'dgsa.cluster', 1, vars[i], NA)
#         d95 <- L1_boot(df.subset, 'dgsa.cluster', 1, vars[i]) %>% quantile(0.95)
#         val.1 <- d/d95
#         d <- L1_boot(df.subset, 'dgsa.cluster', 2, vars[i], boot = NA)
#         d95 <- L1_boot(df.subset, 'dgsa.cluster', 2, vars[i]) %>% quantile(0.95)
#         val.2 <- d/d95
#         max(val.1, val.2)
#       }
#     }
#   }
# stopCluster(cl)
# Sys.time() - start
# 
# ## save checkpoint
# save(interactions, file = '_sensitivity/rp100/checkpoints/interactions.Rdata')
load('_sensitivity/rp100/checkpoints/interactions.Rdata')

make some conclusions here re: which variables we actually care about

5. Choose best-fit simulation

## figure out which points are closest in MDS space
hd.dist <- map_dbl(.x = 1:n, 
  .f = ~sqrt(sum((mds.hd$points[.x,1:mds.hd$dim90] - mds.hd$points[n+1,1:mds.hd$dim90])^2)))
g <- ggplot(data.frame(mds.hd$points[,1:2]) %>% 
         setNames(c('x', 'y')) %>% 
         mutate(num = 1:(n+1)) %>% 
         mutate(dist = c(hd.dist, 0))) +
  geom_point(aes(x=x, y=y, size = num==n+1, color = dist), show.legend = FALSE) + 
  scale_color_scico(palette = 'davos') + 
  scale_size_manual(values = c(2, 3)) + 
  labs(x = paste0('Dimension 1 (', percent(mds.hd$var[1], accuracy = 0.1), ')'),
       y = paste0('Dimension 2 (', percent(mds.hd$var[2], accuracy = 0.1), ')')) +
  ggtitle('Hausdorff Distance: MDS Plot')
plotly::ggplotly(g)  
## subset to the top 5% of clusters (~50 samples)
temp <- data.frame(mds.hd$points[,1:2]) %>% 
  setNames(c('x', 'y')) %>% 
  mutate(num = 1:(n+1)) %>% 
  filter(num <= n) %>% 
  mutate(dist = hd.dist) %>% 
  dplyr::select(num, dist) %>% 
  left_join(samples.rp100, ., by = c('mds.cluster' = 'num'))
bestfit.id <- temp %>% 
  arrange(dist) %>% 
  filter(mds.cluster %in% unique(mds.cluster[1:50])) %>% 
  pull(id)

#### create distance matrix based on fstat/critical success ratio ####

n <- length(bestfit.id)
# start <- Sys.time()
# # pb <- txtProgressBar(min = 0, max = n, style = 3)
# cl <- parallel::makeCluster(num_cores)
# registerDoSNOW(cl)
# fstat2 <-
#   foreach(i = 1:n,
#     .combine = 'rbind', .export = 'confusion',
#     # .options.snow = list(progress = function(n) setTxtProgressBar(pb, n)),
#     .packages = c('raster', 'dplyr', 'pracma', 'foreach')) %dopar% {
#       sim.i <- paste0('_sensitivity/rp100/sherlock/results/rpflow', bestfit.id[i], '.max') %>%
#         raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#       foreach(j = 1:n, .combine = 'c') %do% {
#         if (i >= j) 0 else {
#           sim.j <- paste0('_sensitivity/rp100/sherlock/results/rpflow', bestfit.id[j], '.max') %>%
#             raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#           tb <- overlay(sim.i, sim.j, fun = confusion)[] %>% table
#           1 - tb['0']/sum(tb)
#         }
#       }
#     }
# stopCluster(cl)
# Sys.time() - start
# 
# fstat2 <- fstat2 %>%
#   cbind(1-samples.rp100$fstat[bestfit.id]) %>%
#   rbind(rep(0, n+1))
# fstat2 <- fstat2 + t(fstat2)
# 
# #### create distance matrix based on hausdorff distance ####
# start <- Sys.time()
# # pb <- txtProgressBar(min = 0, max = n, style = 3)
# cl <- parallel::makeCluster(num_cores)
# registerDoSNOW(cl)
# hausdorff2 <-
#   foreach(i = 1:n,
#     .combine = 'rbind', .export = 'binary',
#     # .options.snow = list(progress = function(n) setTxtProgressBar(pb, n)),
#     .packages = c('raster', 'dplyr', 'pracma', 'foreach')) %dopar% {
#       sim.i <- paste0('_sensitivity/rp100/sherlock/results/rpflow', bestfit.id[i], '.max') %>%
#         raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#       foreach(j = 1:n, .combine = 'c') %do% {
#         if (i >= j) 0 else {
#           sim.j <- paste0('_sensitivity/rp100/sherlock/results/rpflow', bestfit.id[j], '.max') %>%
#             raster %>% overlay(dem.hydro, fun = function(x,y) ifelse(is.na(y), NA, x))
#           hausdorff_dist(binary(sim.i), binary(sim.j))
#         }
#       }
#     }
# stopCluster(cl)
# Sys.time() - start
# 
# hausdorff2 <- hausdorff2 %>%
#   cbind(samples.rp100$hausdorff[bestfit.id]) %>%
#   rbind(rep(0, n+1))
# hausdorff2 <- hausdorff2 + t(hausdorff2)
# 
# ## save checkpoint
# save(fstat2, hausdorff2, file = '_sensitivity/rp100/checkpoints/distances2.Rdata')
load('_sensitivity/rp100/checkpoints/distances2.Rdata')
## perform multi-dimensional scaling
mds.fstat2 <- cmdscale(fstat2, eig = TRUE, k = 5)
mds.hd2 <- cmdscale(hausdorff2, eig = TRUE, k = 5)

## scale eigenvalues to represent % variance explained
mds.fstat2$var <- abs(mds.fstat2$eig)/sum(abs(mds.fstat2$eig))
mds.hd2$var <- abs(mds.hd2$eig)/sum(abs(mds.hd2$eig))

# bestfit.dist <- map_dbl(.x = 1:n, .f = ~sqrt(sum((mds.hd2$points[.x,] - mds.hd2$points[n+1,])^2)))
# 
# samples.rp100[bestfit.id,] %>% View